Here is a very short NASM program that displays "Hello, World" on a line then exits. Like most programs on this page, you link it with a C library:
asm/nasm/Win32/helloworld.asm

; ----------------------------------------------------------------------------
; helloworld.asm
;
; This is a Win32 console program that writes "Hello, World" on one line and
; then exits.  It needs to be linked with a C library.
; ----------------------------------------------------------------------------

        global  _main
        extern  _printf

        section .text
_main:
        push    dword message
        call    _printf
        add     esp, 4
        ret
message:
        db      'Hello, World', 10, 0
To assemble, link and run this program under Windows:
nasm -fwin32 helloworld.asm
gcc helloworld.obj
a
Under Linux, you'll need to remove the leading underscores from function names, and execute
nasm -felf helloworld.asm
gcc helloworld.o
./a.out
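If you are writing assembly language functions that will link with C, and you're using gcc, you must obey the gcc calling conventions. These are: parameters are pushed on the stack right to left and are removed by the caller after the call; on entry to a function the return address is at [esp] and the first parameter is at [esp+4]; ebx, esi, edi, and ebp must be saved and restored if you use them, while eax, ecx, and edx may be changed freely; and an integer result is returned in eax, a floating-point result in st0.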
This program prints the first few Fibonacci numbers, illustrating how registers have to be saved and restored:
asm/nasm/Win32/fib.asm

; ----------------------------------------------------------------------------
; fib.asm
;
; This is a Win32 console program that writes the first 40 Fibonacci numbers.
; It needs to be linked with a C library.
; ----------------------------------------------------------------------------

        global  _main
        extern  _printf

        section .text
_main:
        push    ebx                     ; we have to save this since we use it

        mov     ecx, 40                 ; ecx will countdown from 40 to 0
        xor     eax, eax                ; eax will hold the current number
        xor     ebx, ebx                ; ebx will hold the next number
        inc     ebx                     ; ebx is originally 1
print:
        ; We need to call printf, but we are using eax, ebx, and ecx.  printf
        ; may destroy eax and ecx so we will save these before the call and
        ; restore them afterwards.

        push    eax
        push    ecx

        push    eax
        push    dword format
        call    _printf
        add     esp, 8

        pop     ecx
        pop     eax

        mov     edx, eax                ; save the current number
        mov     eax, ebx                ; next number is now current
        add     ebx, edx                ; get the new next number
        dec     ecx                     ; count down
        jnz     print                   ; if not done counting, do some more

        pop     ebx                     ; restore ebx before returning
        ret
format:
        db      '%10d', 0
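For comparison, the register shuffle in the loop can be mirrored in C. This is only a sketch: the function name fib_sequence and the output array are illustrative assumptions, not part of fib.asm, and the variables current and next stand in for eax and ebx.

```c
#include <stddef.h>

/* Hypothetical helper (not part of fib.asm): stores the first n Fibonacci
 * numbers in out[], using the same update steps as the assembly loop. */
void fib_sequence(unsigned int *out, size_t n) {
    unsigned int current = 0;           /* plays the role of eax */
    unsigned int next = 1;              /* plays the role of ebx */
    size_t count;                       /* counts up; ecx counts down */

    for (count = 0; count < n; count++) {
        unsigned int saved = current;   /* mov edx, eax */
        out[count] = current;           /* the value the assembly prints */
        current = next;                 /* mov eax, ebx */
        next += saved;                  /* add ebx, edx */
    }
}
```

Calling it with a 40-element array gives the same 40 values the assembly program prints, which makes it a convenient cross-check.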
This program is just a simple function that takes in three integer parameters and returns the maximum value. It shows that the parameters will be at [esp+4], [esp+8] and [esp+12], and that the value gets returned in eax.
asm/nasm/Win32/maxofthree.asm

; ----------------------------------------------------------------------------
; maxofthree.asm
;
; NASM implementation of a function that returns the maximum value of its
; three integer parameters.  The function has prototype:
;
;   int maxofthree(int x, int y, int z)
;
; Note that only eax, ecx, and edx were used so no registers had to be saved
; and restored.
; ----------------------------------------------------------------------------

        global  _maxofthree

        section .text
_maxofthree:
        mov     eax, [esp+4]
        mov     ecx, [esp+8]
        mov     edx, [esp+12]
        cmp     eax, ecx
        cmovl   eax, ecx
        cmp     eax, edx
        cmovl   eax, edx
        ret
Here is a C program that calls the assembly language function.
asm/nasm/Win32/callmaxofthree.c

/*
 * callmaxofthree.c
 *
 * Illustrates how to call the maxofthree function we wrote in assembly
 * language.
 */

#include <stdio.h>

int maxofthree(int, int, int);

int main() {
    printf("%d\n", maxofthree(1, -4, -7));
    printf("%d\n", maxofthree(2, -6, 1));
    printf("%d\n", maxofthree(2, 3, 1));
    printf("%d\n", maxofthree(-2, 4, 3));
    printf("%d\n", maxofthree(2, -6, 5));
    printf("%d\n", maxofthree(2, 4, 6));
    return 0;
}
To assemble, link and run this two-part program (on Windows):
nasm -fwin32 maxofthree.asm
gcc callmaxofthree.c maxofthree.obj
a
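To check the assembly function's results, it can help to have a plain C version to compare against. The sketch below is an assumption for illustration (the name maxofthree_ref does not appear in the original files); each conditional assignment corresponds to one cmp/cmovl pair.

```c
/* Hypothetical reference version of maxofthree, for cross-checking only. */
int maxofthree_ref(int x, int y, int z) {
    int result = x;                     /* mov eax, [esp+4] */
    if (result < y) result = y;         /* cmp eax, ecx / cmovl eax, ecx */
    if (result < z) result = z;         /* cmp eax, edx / cmovl eax, edx */
    return result;                      /* the assembly returns this in eax */
}
```

For the six calls in callmaxofthree.c this version returns 1, 2, 3, 4, 5, and 6 in turn.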
You know that in C, main is just a plain old function, and it has a couple of parameters of its own:
int main(int argc, char** argv)
Here is a program that uses this fact to echo its command-line arguments, one per line:
asm/nasm/Win32/echo.asm

; ----------------------------------------------------------------------------
; echo.asm
;
; NASM implementation of a program that displays its commandline arguments,
; one per line.
; ----------------------------------------------------------------------------

        global  _main
        extern  _printf

        section .text
_main:
        mov     ecx, [esp+4]            ; argc
        mov     edx, [esp+8]            ; argv
top:
        push    ecx                     ; save registers that printf wastes
        push    edx

        push    dword [edx]             ; the argument string to display
        push    dword format            ; the format string
        call    _printf
        add     esp, 8                  ; remove the two parameters

        pop     edx                     ; restore registers printf used
        pop     ecx

        add     edx, 4                  ; point to next argument
        dec     ecx                     ; count down
        jnz     top                     ; if not done counting keep going

        ret
format:
        db      '%s', 10, 0
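The same loop can be written in C with a pointer that advances and a counter that runs down, which may make the assembly easier to follow. This is a sketch under assumptions: the function name echo_args, the FILE* parameter, and the return value are all illustrative and not part of echo.asm. Like the assembly, it assumes argc is at least 1.

```c
#include <stdio.h>

/* Hypothetical C rendering of the loop in echo.asm.
 * Assumes argc >= 1.  Returns the number of lines written. */
int echo_args(int argc, char **argv, FILE *out) {
    int written = 0;
    do {
        fprintf(out, "%s\n", *argv);    /* push [edx], push format, call */
        argv++;                         /* add edx, 4 */
        written++;
        argc--;                         /* dec ecx */
    } while (argc != 0);                /* jnz top */
    return written;
}
```

Passing main's own argc and argv along with stdout reproduces the behaviour of the assembly program, program name included.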
Note that as far as the C Library is concerned, command line arguments are always strings. If you want to treat them as integers, call atoi. Here's a neat program to compute x^y.
asm/nasm/Win32/power.asm

; ----------------------------------------------------------------------------
; power.asm
;
; Command line application to compute x^y
; Syntax: power x y
; x and y are integers
; ----------------------------------------------------------------------------

        global  _main
        extern  _atoi
        extern  _printf

        section .text
_main:
        push    ebx                     ; save the registers that must be saved
        push    esi
        push    edi

        mov     eax, [esp+16]           ; argc (it's not at [esp+4] now :-))
        cmp     eax, 3                  ; must have exactly two arguments
        jne     error1

        mov     ebx, [esp+20]           ; argv
        push    dword [ebx+4]           ; argv[1]
        call    _atoi
        add     esp, 4
        mov     esi, eax                ; x in esi
        push    dword [ebx+8]
        call    _atoi                   ; argv[2]
        add     esp, 4
        cmp     eax, 0
        jl      error2
        mov     edi, eax                ; y in edi

        mov     eax, 1                  ; start with answer = 1
check:
        test    edi, edi                ; we're counting y downto 0
        jz      gotit                   ; done
        imul    eax, esi                ; multiply in another x
        dec     edi
        jmp     check
gotit:                                  ; print report on success
        push    eax
        push    dword answer
        call    _printf
        add     esp, 8
        jmp     done
error1:                                 ; print error message
        push    dword badArgumentCount
        call    _printf
        add     esp, 4
        jmp     done
error2:                                 ; print error message
        push    dword negativeExponent
        call    _printf
        add     esp, 4
done:                                   ; restore saved registers
        pop     edi
        pop     esi
        pop     ebx
        ret

answer:
        db      '%d', 10, 0
badArgumentCount:
        db      'Requires exactly two arguments', 10, 0
negativeExponent:
        db      'The exponent may not be negative', 10, 0
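The multiply loop at the heart of the program is easier to see in C. The following is a sketch, not the author's code: the name ipow is an assumption, and it treats a negative exponent like zero, whereas power.asm rejects negative exponents before entering the loop. Unsigned arithmetic is used internally so that overflow wraps the way imul does instead of being undefined.

```c
/* Hypothetical C counterpart of the check/gotit loop in power.asm. */
int ipow(int x, int y) {
    unsigned int base = (unsigned int)x;    /* x in esi */
    unsigned int answer = 1;                /* mov eax, 1 */

    while (y > 0) {                         /* test edi, edi / jz gotit */
        answer *= base;                     /* imul eax, esi */
        y--;                                /* dec edi */
    }
    /* Converting back to int is implementation-defined for large values,
     * but wraps on the usual two's-complement compilers. */
    return (int)answer;
}
```

The loop performs y multiplications, so it is linear in the exponent, just like the assembly version.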
Here is an example that uses only two floating point instructions, fldz and fadd.
asm/nasm/Win32/sum.asm

; ----------------------------------------------------------------------------
; sum.asm
;
; NASM implementation of a function that returns the sum of all the elements
; in a floating-point array.  The function has prototype:
;
;   double sum(double array[], int length)
; ----------------------------------------------------------------------------

        global  _sum

        section .text
_sum:
        mov     edx, [esp+4]            ; address of array
        mov     ecx, [esp+8]            ; length of array
        fldz                            ; initialize the sum to 0
        cmp     ecx, 0                  ; guard against non-positive lengths!
        jle     done
next:
        fadd    qword [edx]             ; add in the current array element
        add     edx, 8                  ; move to next array element
        dec     ecx                     ; count down
        jnz     next                    ; if not done counting, continue
done:
        ret                             ; return value already in st0
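A C version of the same loop is a handy reference when testing the assembly function. The name sum_ref is an assumption made for this sketch. Note also that the x87 unit accumulates in extended precision, so the assembly result may differ from this one in the last bits for some inputs.

```c
/* Hypothetical reference version of sum, for cross-checking only. */
double sum_ref(const double *array, int length) {
    double total = 0.0;                 /* fldz */
    while (length > 0) {                /* cmp ecx, 0 / jle done */
        total += *array;                /* fadd qword [edx] */
        array++;                        /* add edx, 8 */
        length--;                       /* dec ecx */
    }
    return total;                       /* the assembly leaves this in st0 */
}
```

As in the assembly, a zero or negative length yields 0 without touching the array.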
The text section is read-only on most operating systems, so you might find the need for a data section. On most operating systems, the data section is only for initialized data, and you have a special .bss section for uninitialized data. Here is a program that averages the command line arguments, expected to be integers, and displays the result as a floating point number. Note that there is no instruction to push an 8-byte value, so we fake it by manipulating esp.
asm/nasm/Win32/average.asm

; ----------------------------------------------------------------------------
; average.asm
;
; NASM implementation of a program that treats all its command line arguments
; as integers, and displays their average as a floating point number.  This
; program uses a data section to store intermediate results, not that it has
; to, but only to illustrate how data sections are used.
; ----------------------------------------------------------------------------

        global  _main
        extern  _printf
        extern  _atoi

        section .text
_main:
        mov     ecx, [esp+4]            ; argc
        dec     ecx                     ; don't count program name
        jz      nothingToAverage
        mov     [count], ecx            ; save number of real arguments
        mov     edx, [esp+8]            ; argv
accumulate:
        push    ecx                     ; save values across call to atoi
        push    edx
        push    dword [edx+ecx*4]       ; argv[ecx]
        call    _atoi                   ; now eax has the int value of arg
        add     esp, 4
        pop     edx                     ; restore registers after atoi call
        pop     ecx
        add     [sum], eax              ; accumulate sum as we go
        dec     ecx
        jnz     accumulate              ; more arguments?
average:
        fild    dword [sum]
        fild    dword [count]
        fdivp   st1, st0                ; sum / count
        sub     esp, 8                  ; make room for quotient on stack
        fstp    qword [esp]             ; "push" quotient
        push    dword format            ; push format string
        call    _printf
        add     esp, 12                 ; 4 bytes format, 8 bytes number
        ret

nothingToAverage:
        push    dword error
        call    _printf
        add     esp, 4
        ret

        section .data
count:  dd      0
sum:    dd      0
format: db      '%.15f', 10, 0
error:  db      'There are no command line arguments to average', 10, 0
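The arithmetic the program performs can be summarized in C. This sketch makes some assumptions: the name average_args is invented, and it returns 0 when there is nothing to average, where the assembly prints an error message instead. Like the assembly, it walks argv from the last argument back to the first.

```c
#include <stdlib.h>

/* Hypothetical C counterpart of the accumulate/average steps. */
double average_args(int argc, char **argv) {
    int count = argc - 1;               /* dec ecx: skip the program name */
    int sum = 0;
    int i;

    if (count <= 0) {
        return 0.0;                     /* assumption: the asm prints an error */
    }
    for (i = count; i >= 1; i--) {
        sum += atoi(argv[i]);           /* add [sum], eax */
    }
    return (double)sum / (double)count; /* fild, fild, fdivp */
}
```

The double returned here occupies 8 bytes, which is why the assembly reserves 8 bytes on the stack before calling printf.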
Perhaps surprisingly, there's nothing out of the ordinary required to implement recursive functions. You push parameters on the stack, after all! Here's an example. In C:
int factorial(int n) {
    return (n <= 1) ? 1 : n * factorial(n-1);
}
In assembly language:
asm/nasm/Win32/factorial.asm

; ----------------------------------------------------------------------------
; factorial.asm
;
; Illustration of a recursive function.
; ----------------------------------------------------------------------------

        global  _factorial

        section .text
_factorial:
        mov     eax, [esp+4]            ; n
        cmp     eax, 1                  ; n <= 1
        jnle    L1                      ; if not, go do a recursive call
        mov     eax, 1                  ; otherwise return 1
        jmp     L2
L1:
        dec     eax                     ; n-1
        push    eax                     ; push argument
        call    _factorial              ; do the call, result goes in eax
        add     esp, 4                  ; get rid of argument
        imul    eax, [esp+4]            ; n * factorial(n-1)
L2:
        ret
The 64-bit MMX registers can do eight byte operations in parallel, or four (16-bit) word operations in parallel, or two (32-bit) doubleword operations in parallel. The 128-bit XMMs can do 16 byte, 8 word, or 4 doubleword operations in parallel, and do parallel floating-point computations too (4 single precision or 2 double precision). Here is a simple function that sums two arrays of 16-bit short ints, four at a time:
asm/nasm/Win32/mmxarrayadd.asm

; ----------------------------------------------------------------------------
; mmxarrayadd.asm
;
; NASM implementation of a function that adds two short arrays.
;
;   void add(short a[], short b[], int n)
; ----------------------------------------------------------------------------

        global  _add

        section .text
_add:
        push    ebx                     ; callee save register
        mov     eax, [esp+8]            ; eax points to a
        mov     edx, [esp+12]           ; edx points to b
        mov     ecx, [esp+16]           ; ecx <- number of items in each array
        or      ecx, ecx                ; guard against negative lengths
        js      L4
L1:
        cmp     ecx, 4                  ; Less than 4 items left?
        jl      L2                      ; if so, handle them individually
        movq    mm0, qword [eax]        ; Get four items from a
        paddw   mm0, qword [edx]        ; Add them with next four items from b
        movq    qword [eax], mm0        ; Write them back to a
        add     eax, 8                  ; Advance a to point to next 4 words
        add     edx, 8                  ; Advance b to point to next 4 words
        sub     ecx, 4                  ; We've just handled four
        jmp     L1
L2:
        jecxz   L4                      ; Are there zero items left?
L3:
        mov     bx, word [eax]          ; One word at a time addition
        add     bx, word [edx]
        mov     word [eax], bx
        inc     eax
        inc     eax
        inc     edx
        inc     edx
        dec     ecx
        jnz     L3
L4:
        emms                            ; clear MMX state so later x87 code works
        pop     ebx
        ret
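A scalar C model of the function makes the two phases (four at a time, then the leftovers) explicit. It is a sketch under assumptions: the names add_shorts and wrap_add16 are invented here, and the fixed-width types from stdint.h stand in for short. The paddw instruction wraps around on overflow, so the model does too.

```c
#include <stdint.h>

/* Adds as unsigned so overflow wraps like paddw.  Converting the result
 * back to int16_t is implementation-defined in C, but wraps on the usual
 * two's-complement compilers. */
static int16_t wrap_add16(int16_t x, int16_t y) {
    return (int16_t)(uint16_t)((uint16_t)x + (uint16_t)y);
}

/* Hypothetical model of mmxarrayadd.asm: a[i] += b[i] for i in [0, n). */
void add_shorts(int16_t a[], const int16_t b[], int n) {
    int i = 0;

    for (; n - i >= 4; i += 4) {        /* the movq / paddw / movq path */
        a[i]     = wrap_add16(a[i],     b[i]);
        a[i + 1] = wrap_add16(a[i + 1], b[i + 1]);
        a[i + 2] = wrap_add16(a[i + 2], b[i + 2]);
        a[i + 3] = wrap_add16(a[i + 3], b[i + 3]);
    }
    for (; i < n; i++) {                /* the one-word-at-a-time loop at L3 */
        a[i] = wrap_add16(a[i], b[i]);
    }
}
```

A negative n skips both loops, matching the js guard at the top of the assembly.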
Here's another one, this time using SSE instructions:
asm/nasm/Win32/sseexample.asm

; ----------------------------------------------------------------------------
; sseexample.asm
;
; This program demonstrates a few SSE instructions, for no particular reason
; other than to show them off.
; ----------------------------------------------------------------------------

        extern  _printf
        global  _main

        section .text
_main:
        push    esi                     ; callee save register

        ; Illustrate packed square root computations
        movups  xmm3, [x]
        sqrtps  xmm0, xmm3
        movups  [y], xmm0
        call    printall

        ; Illustrate packed maximums
        movups  xmm2, [x]
        movups  xmm5, [z]
        maxps   xmm2, xmm5
        movups  [y], xmm2
        call    printall

        ; Done
        pop     esi
        ret

printall:
        mov     esi, 4
printone:
        ; Note printf will NOT ACCEPT single precision floats.
        ; We have to convert them to double precision floats.  Sigh.
        fld     dword [y-4+esi*4]
        sub     esp, 8
        fstp    qword [esp]
        push    dword format
        call    _printf
        add     esp, 12
        dec     esi
        jnz     printone
        ret

        section .data
        align   16
x       dd      10.0
        dd      100.0
        dd      400.0
        dd      653.2664
y       dd      0.0
        dd      0.0
        dd      0.0
        dd      0.0
z       dd      5.0
        dd      900.0
        dd      316.20
        dd      111.0
format  db      '%15.7f', 10, 0
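The packed maximum step can be modeled lane by lane in C. This is a sketch with an invented name, packed_max4, and it covers only the maxps part of the program, not the square roots. For ordinary (non-NaN) values, each output lane is simply the larger of the two inputs.

```c
/* Hypothetical scalar model of "maxps xmm2, xmm5" with x in xmm2 and
 * z in xmm5: each lane keeps x if it is greater, otherwise takes z. */
void packed_max4(const float x[4], const float z[4], float y[4]) {
    int lane;
    for (lane = 0; lane < 4; lane++) {
        y[lane] = (x[lane] > z[lane]) ? x[lane] : z[lane];
    }
}
```

With the x and z arrays from the data section, the lanes of y come out as 10.0, 900.0, 400.0, and 653.2664. The printall routine in the assembly walks the lanes from the last to the first, so it prints them in the reverse order.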
This program illustrates saturated addition.
asm/nasm/Win32/satexample.asm

; ----------------------------------------------------------------------------
; satexample.asm
;
; This is a short example of parallel saturated addition using paddsw.
; It takes two 64-bit quantities
;
;     80008FFF0005FEF2
;     800020E07FFE99AA
;
; and performs saturated addition on the four 16-bit blocks in parallel,
; then writes the resulting value, in hex, to standard output.  The answer
; should be
;
;     8000B0DF7FFF989C
; ----------------------------------------------------------------------------

        extern  _printf
        global  _main

        section .text
_main:
        movq    mm0, [x]
        paddsw  mm0, [y]                ; Do 4 saturated additions in parallel
        movq    [x], mm0
        emms                            ; done with MMX, clear its state

        push    dword [x]               ; can't push 64 bits at once
        push    dword [x+4]             ; nor does printf handle 64-bit ints
        push    dword format
        call    _printf
        add     esp, 12
        ret

        section .data
x       dw      0fef2h, 0005h, 8fffh, 8000h
y       dw      099aah, 7ffeh, 20e0h, 8000h
format  db      '%08X%08X', 10, 0       ; zero-pad each 32-bit half
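The expected answer in the header comment can be checked with a small C model of paddsw. The names sat_add16 and paddsw_model are assumptions made for this sketch. Each 16-bit lane is added as a signed value and clamped to the range of a signed 16-bit integer instead of wrapping.

```c
#include <stdint.h>

/* Signed 16-bit addition that clamps instead of wrapping. */
static int16_t sat_add16(int16_t a, int16_t b) {
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}

/* Hypothetical model of paddsw on two 64-bit values, four lanes each.
 * The uint16_t -> int16_t conversions are implementation-defined in C,
 * but reinterpret the bits on the usual two's-complement compilers. */
uint64_t paddsw_model(uint64_t x, uint64_t y) {
    uint64_t result = 0;
    int lane;

    for (lane = 0; lane < 4; lane++) {
        int16_t a = (int16_t)(uint16_t)(x >> (16 * lane));
        int16_t b = (int16_t)(uint16_t)(y >> (16 * lane));
        uint16_t r = (uint16_t)sat_add16(a, b);
        result |= (uint64_t)r << (16 * lane);
    }
    return result;
}
```

Feeding it 0x80008FFF0005FEF2 and 0x800020E07FFE99AA gives 0x8000B0DF7FFF989C: the lowest and third lanes add normally, the second lane saturates at 7FFF, and the top lane saturates at 8000.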
You probably have the OpenGL graphics library already on your system, so why not call it from an assembly language program?
asm/nasm/Win32/triangle.asm

; ----------------------------------------------------------------------------
; triangle.asm
;
; A very simple *Windows* OpenGL application using the GLUT library.  It
; draws a nicely colored triangle in a top-level application window.  One
; interesting thing is that the Windows GL and GLUT functions do NOT use the
; C calling convention; instead they use the "stdcall" convention which is
; like C except that the callee pops the parameters.
; ----------------------------------------------------------------------------

        global  _main
        extern  _glClear@4
        extern  _glBegin@4
        extern  _glEnd@0
        extern  _glColor3f@12
        extern  _glVertex3f@12
        extern  _glFlush@0
        extern  _glutInit@8
        extern  _glutInitDisplayMode@4
        extern  _glutInitWindowPosition@8
        extern  _glutInitWindowSize@8
        extern  _glutCreateWindow@4
        extern  _glutDisplayFunc@4
        extern  _glutMainLoop@0

        section .text
title:  db      'A Simple Triangle', 0
zero:   dd      0.0
one:    dd      1.0
half:   dd      0.5
neghalf:dd      -0.5

display:
        push    dword 16384
        call    _glClear@4              ; glClear(GL_COLOR_BUFFER_BIT)
        push    dword 9
        call    _glBegin@4              ; glBegin(GL_POLYGON)
        push    dword 0
        push    dword 0
        push    dword [one]
        call    _glColor3f@12           ; glColor3f(1, 0, 0)
        push    dword 0
        push    dword [neghalf]
        push    dword [neghalf]
        call    _glVertex3f@12          ; glVertex(-.5, -.5, 0)
        push    dword 0
        push    dword [one]
        push    dword 0
        call    _glColor3f@12           ; glColor3f(0, 1, 0)
        push    dword 0
        push    dword [neghalf]
        push    dword [half]
        call    _glVertex3f@12          ; glVertex(.5, -.5, 0)
        push    dword [one]
        push    dword 0
        push    dword 0
        call    _glColor3f@12           ; glColor3f(0, 0, 1)
        push    dword 0
        push    dword [half]
        push    dword 0
        call    _glVertex3f@12          ; glVertex(0, .5, 0)
        call    _glEnd@0                ; glEnd()
        call    _glFlush@0              ; glFlush()
        ret

_main:
        push    dword [esp+8]           ; push argv
        lea     eax, [esp+8]            ; get addr of argc (offset changed :-)
        push    eax
        call    _glutInit@8             ; glutInit(&argc, argv)
        push    dword 0
        call    _glutInitDisplayMode@4
        push    dword 80
        push    dword 80
        call    _glutInitWindowPosition@8
        push    dword 300
        push    dword 400
        call    _glutInitWindowSize@8
        push    dword title
        call    _glutCreateWindow@4
        push    dword display
        call    _glutDisplayFunc@4
        call    _glutMainLoop@0
        ret
After entering a function, we can reserve space for local variables by decrementing the stack pointer. For example, the C function
int example(int x, int y) {
    int a, b, c;
    b = 7;
    return x * b + y;
}
can be translated as follows:
_example:
        sub     esp, 12                 ; make room for 3 ints
        mov     dword [esp+4], 7        ; b = 7
        mov     eax, [esp+16]           ; x
        imul    eax, [esp+4]            ; x * b
        add     eax, [esp+20]           ; x * b + y
        add     esp, 12                 ; release the locals so ret finds retaddr
        ret
After "sub esp, 12" the stack looks like:
         +---------+
esp      |    a    |
         +---------+
esp+4    |    b    |
         +---------+
esp+8    |    c    |
         +---------+
esp+12   | retaddr |
         +---------+
esp+16   |    x    |
         +---------+
esp+20   |    y    |
         +---------+
Sometimes it is a real pain to try to keep track of the offsets of your parameters and local variables because the stack pointer keeps changing. For example, in
int example(int x, int y) {
    int a, b, c;
    ...
    f(y, a, b, b, x);
    ...
}
you cannot translate the function call as
        push    dword [esp+16]
        push    dword [esp+4]           ; WRONG! b is really now at [esp+8]
        push    dword [esp+4]           ; WRONG! b is really now at [esp+12]
        push    dword [esp]             ; WRONG! a is really now at [esp+12]
        push    dword [esp+20]          ; WRONG! y is really now at [esp+36]
        call    f
For this reason, many functions use the ebp register to index the "stack frame" of local variables and parameters, like this:
        push    ebp                     ; must save old ebp
        mov     ebp, esp                ; point ebp to this frame
        sub     esp, ___                ; make space for locals
        ...
        mov     esp, ebp                ; clean up locals
        pop     ebp                     ; restore old ebp
        ret
As long as you never change ebp throughout the function, all your local variables and parameters will always be at the same offset from ebp. The stack frame for our example function is now:
         +---------+
ebp-12   |    a    |
         +---------+
ebp-8    |    b    |
         +---------+
ebp-4    |    c    |
         +---------+
ebp      | old ebp |
         +---------+
ebp+4    | retaddr |
         +---------+
ebp+8    |    x    |
         +---------+
ebp+12   |    y    |
         +---------+
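The offset bookkeeping that the frame pointer saves you from can be written down as simple arithmetic. The C sketch below is purely illustrative: both function names are invented, and it assumes every push moves esp down by exactly 4 bytes, as in the 32-bit examples above.

```c
/* Hypothetical model: offset from esp of a stack slot that was at
 * original_offset before `pushes` more dwords were pushed. */
int esp_offset_after_pushes(int original_offset, int pushes) {
    return original_offset + 4 * pushes;
}

/* Hypothetical model: offsets from ebp do not move, because ebp is left
 * unchanged while esp moves. */
int ebp_offset_after_pushes(int original_offset, int pushes) {
    (void)pushes;                       /* pushes have no effect on ebp */
    return original_offset;
}
```

For instance, y starts at [esp+20] and is at [esp+36] after four pushes, while relative to ebp it stays at [ebp+12] the whole time.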